32 research outputs found

    A Knowledge-Based Topic Modeling Approach for Automatic Topic Labeling

    Probabilistic topic models, which aim to discover latent topics in text corpora, define each document as a multinomial distribution over topics and each topic as a multinomial distribution over words. Although humans can infer a proper label for a topic by looking at its top representative words, machines cannot. Automatic topic labeling techniques try to address this problem; their ultimate goal is to assign interpretable labels to the learned topics. In this paper, we take ontology concepts into consideration, instead of words alone, to improve the quality of the labels generated for each topic. Our work differs from previous efforts in this area, where topics are usually represented by a batch of words selected from the topics. Our approach has two main aspects: 1) we incorporate ontology concepts with statistical topic modeling in a unified framework, where each topic is a multinomial probability distribution over concepts and each concept is represented as a distribution over words; and 2) we propose a topic labeling model based on the meaning of the ontology concepts included in the learned topics. The best topic labels are selected with respect to the semantic similarity of the concepts and their ontological categorizations. Comprehensive experiments on two different data sets demonstrate the effectiveness of treating ontological concepts as a richer layer between topics and words. In other words, representing topics via ontological concepts is an effective way to generate descriptive and representative labels for the discovered topics.
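    The topic, concept, and word layering described in this abstract can be illustrated with a minimal sketch. It assumes a topic's word distribution is obtained by marginalizing over concepts, p(w | z) = sum over c of p(c | z) p(w | c). The concept names, words, probabilities, and the function name `topic_word_distribution` are hypothetical illustrations, not taken from the paper, and the sketch does not show how the distributions are learned.

```python
def topic_word_distribution(topic_over_concepts, concept_over_words):
    """Marginalize concepts out: p(w | z) = sum_c p(c | z) * p(w | c)."""
    word_probs = {}
    for concept, p_concept in topic_over_concepts.items():
        for word, p_word in concept_over_words[concept].items():
            word_probs[word] = word_probs.get(word, 0.0) + p_concept * p_word
    return word_probs


# Hypothetical toy distributions (not from the paper).
topic_over_concepts = {"Cardiology": 0.75, "Pharmacology": 0.25}
concept_over_words = {
    "Cardiology": {"heart": 0.5, "valve": 0.5},
    "Pharmacology": {"drug": 0.5, "heart": 0.5},
}

topic_words = topic_word_distribution(topic_over_concepts, concept_over_words)

# Rank words by probability, as one would when inspecting a topic.
for word, prob in sorted(topic_words.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {prob:.3f}")
```

    In this toy setting the concept with the highest weight in a topic ("Cardiology") would be a natural label candidate; the paper's actual selection uses semantic similarity and ontological categorization, which this sketch does not attempt to reproduce.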

    Text Summarization Techniques: A Brief Survey

    In recent years, there has been an explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge that needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.

    Federated Learning in Cardiac Diagnostics: Balancing Predictive Accuracy with Data Privacy in Heart Sound Classification

    Cardiovascular diseases represent a significant global health concern, accounting for 31% of all worldwide deaths. While machine learning presents a promising avenue for early and accurate diagnosis, the associated ethical and legal challenges, especially concerning data privacy, complicate its direct application. This research paper delves into Federated Learning (FL), a decentralized method, as a potential solution to address data utility and privacy concerns. FL enables devices or servers to hold subsets of overall data, compute local updates, and relay them to a central server without transferring raw data, thus maintaining privacy. The study aims to evaluate the feasibility and efficacy of applying FL to heart disease prediction while maintaining ethical and legal standards. Prior work in this domain, particularly by Wanyong et al., utilized FL for heart sound analysis, highlighting its advantages in data privacy and decentralization. Drawing on this background, our research contributes to the dual objectives of enhancing healthcare outcomes and ensuring data privacy, setting a benchmark for the future application of machine learning in medical research

    Towards Federated Learning-based IoT Security

    In recent years, we have witnessed the dramatic growth of mobile devices in the IoT domain, which enables people and services to interconnect and exchange information constantly. The number of IoT mobile users continues to grow, connecting more and more people and devices. On the flip side, IoT mobile devices are subject to insecure design, implementation, or configuration. As a result, the underlying networks built on such devices are exposed to misuse and cyberattacks [1]. To safeguard mobile IoT networks, Intrusion Detection Systems (IDSs) have been widely used to monitor network traffic and identify suspicious activities within it. IDSs are often regarded as a critical component in protecting IoT nodes and networks and in mitigating the adverse effects of cyberattacks targeting IoT. In general, IDSs are categorized as signature-based or anomaly-based defense mechanisms [2]. Signature-based IDSs recognize intrusions (or suspicious activities) by matching traffic against previously learned rules/signatures of known attacks. Anomaly-based IDSs monitor network traffic and compare it with previously learned patterns to spot malicious activities. Despite their wide adoption, IDS-based methods are not very effective in detecting new and unknown adversarial attacks (signature-based IDSs are unable to detect new attacks unless they have the latest version of all attack signatures). Anomaly-based methods have been shown to recognize known and new attacks to some degree, but they often incur high false-positive rates, which hinders accuracy [3]. Given the massive scale and heterogeneous networks of IoT mobile devices, the effectiveness of IDSs in detecting attacks is questionable. Machine Learning (ML), as a cutting-edge technology for designing and implementing robust intelligent systems, has contributed greatly to cybersecurity solutions. 
The past decade has witnessed the release of several approaches utilizing ML for different aspects of cybersecurity, ranging from malware detection and threat intelligence to forensic investigation and privacy preservation. Deep Learning (DL) is one of the emerging topics of ML; it generally refers to a learning model that includes several layers, each containing a large number of computational nodes. DL models have demonstrated their suitability and competency for different data-driven problems, including cybersecurity. Recently, recurrent neural networks have been used in different IDSs based on anomaly-detection techniques. Malhorta et al. [4] apply Long Short-Term Memory (LSTM) for detecting anomalies in time series data. Opera et al. [5] utilize a deep network to analyze DNS log data to detect anomaly patterns in enterprise networks. The recently proposed approaches mostly operate in an off-line manner, while there is a high demand for a real-time data analysis platform to detect zero-day vulnerabilities and anomaly attacks. In this project, we aim to propose a new, effective anomaly detection model to differentiate benign patterns of behavior from malicious activities in mobile-based networks. To tackle the aforementioned challenges, we utilize the Federated Machine Learning (FML) technique for aggregating anomaly detection patterns for IDSs. While traditional Machine Learning models mainly rely on the computational power and training dataset of a centralized server, FML, which has recently gained much attention in different domains, is defined as a combination of federated and machine learning techniques [4]. FML implements different machine learning techniques in a decentralized environment where the machine learning models are developed based on the datasets distributed across different devices. 
Our work intends to use FML with the PySyft framework to develop a global machine learning-based IDS model from many local IDSs running on each IoT mobile device. Applying anomaly detection techniques in the IoT environment is associated with challenges such as resource limitations and the heterogeneity of IoT devices, which current approaches are still dealing with; we plan to target those issues and come up with a novel and efficient model for anomaly detection IDS using FML. In our model, a light version of the IDS (local IDS) will run on each IoT device. The local IDS will be trained, and the improved version of the model will be sent to the main server to contribute to the global IDS. In the training procedure, the IoT device collects local information and incorporates the data into the local model, refining and learning the decision boundary between benign patterns of behavior and malicious activities. The central server generally averages all of the updates to calculate the improved global model. The updated model, which is the most accurate IDS, is then pushed onto the IoT mobile devices. Using FML to implement an anomaly detection-based IDS enables all IoT devices to cooperate in training and improving the global model without the need to share their actual data. Additionally, FML enables the IDSs to operate in real time, a capability we have not observed in current ML-based IDSs.
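    The server-side averaging step described in this abstract can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation, and it does not use the PySyft API. The function name `federated_average`, the representation of a model as a flat list of floats, and the optional weighting by local sample count are assumptions made for the example.

```python
def federated_average(client_weights, client_sizes=None):
    """Average local model parameters into a global model.

    client_weights: one flat list of floats per device (hypothetical format).
    client_sizes:   optional number of local samples per device; when given,
                    devices with more data contribute proportionally more.
                    When omitted, all devices are weighted equally.
    """
    if not client_weights:
        raise ValueError("at least one client update is required")
    if client_sizes is None:
        client_sizes = [1] * len(client_weights)
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical devices send locally trained parameters; raw traffic
# data never leaves the devices, only these parameter lists do.
local_models = [[1.0, 2.0], [3.0, 4.0]]

global_model = federated_average(local_models)
weighted_model = federated_average(local_models, client_sizes=[1, 3])

print(global_model)    # equal weighting
print(weighted_model)  # second device holds three times as much data
```

    The global parameters would then be sent back to each device to replace its local model before the next training round. Weighting by sample count is one common design choice; the abstract itself only states that the server averages the updates.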

    Computer Forensics (Undergraduate) Ancillary Materials

    These ancillary materials for Computer Forensics (Undergraduate) are developed as part of a Round 18 Transformation Grant